Bugfix/prod 426 spark log parser error key error submission time #13
with tempfile.TemporaryDirectory() as temp_dir:
    with self.assertRaises(ValueError, msg="Expected DBC event not found"):
        EventLogBuilder(event_log.as_uri(), temp_dir).build()
self.check_value_error(event_log, "No rollover properties found in log file")
good idea moving this to a function to promote reusability
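A minimal sketch of what such a shared helper could look like, assuming it wraps the build call in `assertRaises` and compares the exception text. `build_event_log` is a hypothetical stand-in for `EventLogBuilder(...).build()` so the example runs on its own; the real helper in this PR may differ.

```python
import unittest


def build_event_log(path):
    """Hypothetical stand-in for EventLogBuilder(...).build(); always fails here."""
    raise ValueError("No rollover properties found in log file")


class EventLogErrorTest(unittest.TestCase):
    def check_value_error(self, event_log, msg):
        # Shared helper: the build must raise ValueError with exactly this text.
        with self.assertRaises(ValueError) as cm:
            build_event_log(event_log)
        self.assertEqual(str(cm.exception), msg, "Exception message doesn't match")

    def test_missing_rollover_properties(self):
        self.check_value_error("dummy.json", "No rollover properties found in log file")
```

Each error-path test then collapses to a single `self.check_value_error(...)` call, as in the diff above.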
tests/test_eventlog.py
Outdated
with open(event_log) as log_fobj:
    event = json.loads(log_fobj.readline())
assert event["Event"] == "DBCEventLoggingListenerMetadata", "Expected first event is present"
is the assertion error message correct? "Expected first event is present"
Oops, I accidentally deleted that when I was having trouble at first (wrong indentation on the assertion). It's back in there now. Neal also suggested turning all the error messages into negatives ("event missing" vs. "event present"), which I did throughout.
tests/test_eventlog.py
Outdated
assert all(
    key in event
    for key in ["Event", "Spark Version", "Timestamp", "Rollover Number", "SparkContext Id"]
), "All keys are present"
To clarify, the message states "All keys are present" on an assertion error. Shouldn't it be something like "A key was missing"?
Also, is it important to check for SparkContext Id?
No need to check for SparkContext Id yet, might be something in the future though. Good catch on the error message.
Yeah, good call out Neal. In the Java shop I worked at we used the message to describe the assertion like,
assertTrue(5 > 4, "5 is greater than 4");
assertNull("car is null", car);
(looks like the semantics were left to the organization though: https://softwareengineering.stackexchange.com/questions/301652/purpose-of-assertion-messages)
I don't remember the output when they failed, but in Python it looks like something that describes the error is better, e.g.
>>> assert "No rollover properties found in log file" == "Rollover file appears to be missing", "Exception message doesn't match"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError: Exception message doesn't match
And the documentation appears to support that: https://docs.python.org/3/reference/simple_stmts.html#grammar-token-python-grammar-assert_stmt
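For the key check discussed in this thread, one way to get a failure-describing message is to collect the absent keys first and name them in the assertion. This is only a sketch: `REQUIRED_KEYS` and `check_first_event` are hypothetical names, the sample values are made up, and "SparkContext Id" is left out per the comment above.

```python
REQUIRED_KEYS = ["Event", "Spark Version", "Timestamp", "Rollover Number"]


def check_first_event(event):
    # Collect the absent keys first so the failure message can name them.
    missing = [key for key in REQUIRED_KEYS if key not in event]
    assert not missing, f"Missing keys in first event: {missing}"


# Sample values are made up for illustration; this call passes silently.
check_first_event(
    {
        "Event": "DBCEventLoggingListenerMetadata",
        "Spark Version": "3.2.1",
        "Timestamp": 1650000000000,
        "Rollover Number": 0,
    }
)
```

If a key is absent, the `AssertionError` text lists exactly which ones, which is usually more useful in a failing test log than a fixed string.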
rmoneys
left a comment
LGTM!
spark_log_parser/eventlog.py
Outdated
    self.source_url, self.work_dir, self.s3_client, extract_thresholds
)

def _validate_event_log_paths(self, event_log_paths: Path | str) -> Path:
Looks like list is missing in the type annotations
event_log_paths = extractor.Extractor(event_log_path.as_uri(), temp_dir).extract()
eventlog.EventLogBuilder(event_log_paths, temp_dir).build()

assert str(cm.exception) == msg # , "Exception message matches"
Just curious - why drop the message?
Oops, that was an oversight. Neal also suggested switching the messages to a negative phrasing ("eventlog missing" vs "eventlog present") since it's more accurate. I did that throughout.
Kudos, SonarCloud Quality Gate passed!
Added a bypass for StageSubmit events that have no "Submission Time" key.
Increased the (uncompressed) file size limit to 20 GB (the test log was 11 GB uncompressed).
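The bypass for events without a "Submission Time" key could look roughly like the sketch below. It assumes the event log is newline-delimited JSON and that stage-submitted events carry a "Stage Info" object; `stage_submission_times` and the sample data are hypothetical, and the parser's actual handling may differ.

```python
import json


def stage_submission_times(lines):
    """Map stage ID -> submission time, skipping stages that never report one."""
    times = {}
    for line in lines:
        event = json.loads(line)
        if event.get("Event") != "SparkListenerStageSubmitted":
            continue
        stage_info = event.get("Stage Info", {})
        if "Submission Time" not in stage_info:
            # Bypass: indexing stage_info["Submission Time"] would raise KeyError.
            continue
        times[stage_info["Stage ID"]] = stage_info["Submission Time"]
    return times


# Made-up sample: stage 1 has no "Submission Time" and is skipped.
sample = [
    json.dumps({"Event": "SparkListenerStageSubmitted",
                "Stage Info": {"Stage ID": 0, "Submission Time": 1650000000000}}),
    json.dumps({"Event": "SparkListenerStageSubmitted",
                "Stage Info": {"Stage ID": 1}}),
]
print(stage_submission_times(sample))  # {0: 1650000000000}
```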